Constructing Unrestricted Adversarial Examples with Generative Models

Neural Information Processing Systems

Adversarial examples are typically constructed by perturbing an existing data point within a small norm ball, and current defense methods focus on guarding against this type of attack. In this paper, we propose a new class of adversarial examples that are synthesized entirely from scratch using a conditional generative model, without being restricted to norm-bounded perturbations. We first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over data samples. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model yet misclassified by a target classifier. We demonstrate through human evaluation that these adversarial images, which we call Generative Adversarial Examples, are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that generative adversarial examples can bypass strong adversarial training and certified defense methods designed for traditional adversarial attacks.
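The latent-space search can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the paper's exact objective: G(z, y) stands in for a pretrained conditional generator (the paper trains an AC-GAN), f(x) for the target classifier's logits, and the prior penalty with weight lam is a soft proxy for the paper's constraint that samples stay likely under the generative model.

import torch

def generative_adversarial_search(G, f, source_class, z0,
                                  steps=200, lr=0.05, lam=0.1):
    # G and f are hypothetical placeholders: G(z, y) maps a latent code and
    # class label to an image; f(x) returns classifier logits.
    z = z0.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    y = torch.tensor([source_class])
    for _ in range(steps):
        x = G(z, y)                    # image generated for the desired class y
        logits = f(x)
        # Log-probability the target classifier assigns to class y; driving
        # this down pushes the classifier toward misclassifying x.
        log_p_y = logits[0, source_class] - torch.logsumexp(logits, dim=1)[0]
        # Keep z near the prior so the sample stays plausible under the
        # generative model (a simplification of the paper's likelihood term).
        loss = log_p_y + lam * z.norm() ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z, y).detach()

Because the image is synthesized from scratch rather than perturbed from a real input, no norm-bounded distance to an existing data point constrains the search, which is what lets such examples sidestep defenses built around small-perturbation threat models.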






Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems

However, does achieving a low attack success rate (ASR) through current safety purification methods truly eliminate the backdoor features learned during pretraining? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods.
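For context, ASR is conventionally measured as the fraction of trigger-stamped inputs that the model classifies as the attacker's target class. A minimal sketch, assuming a PyTorch model and data loader; apply_trigger and target_label are illustrative names, not from the paper:

import torch

def attack_success_rate(model, loader, apply_trigger, target_label):
    # apply_trigger stamps the backdoor trigger onto a batch of inputs;
    # target_label is the attacker's chosen class. Both are placeholders.
    model.eval()
    hits, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            # Skip samples already labeled with the target class, so that
            # correct clean predictions do not inflate the ASR.
            mask = y != target_label
            if mask.sum().item() == 0:
                continue
            preds = model(apply_trigger(x[mask])).argmax(dim=1)
            hits += (preds == target_label).sum().item()
            total += mask.sum().item()
    return hits / max(total, 1)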


c24cd76e1ce41366a4bbe8a49b02a028-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers [R1, R2, R3] for their valuable and constructive comments. We are glad that all of them appreciate the novelty of the proposed metric learning approach for adversarial robustness.